
    Using R and Bioconductor for proteomics data analysis.

    This review presents how R, the popular statistical environment and programming language, can be used in the context of proteomics data analysis. A short introduction to R is given, with special emphasis on some of the features that make R and its add-on packages premium software for sound and reproducible data analysis. The reader is also advised on how to find relevant R software for proteomics. Several use cases are then presented, illustrating data input/output, quality control, quantitative proteomics and data analysis. Detailed code and additional links to extensive documentation are available in the freely available companion package RforProteomics. This article is part of a Special Issue entitled: Computational Proteomics in the Post-Identification Era. Guest Editors: Martin Eisenacher and Christian Stephan.
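
    The companion package mentioned above ships runnable code for each use case. As a minimal, hedged sketch (not code from the article itself), RforProteomics can typically be installed from Bioconductor and its executable documentation browsed as follows; only the package name is taken from the abstract, the rest is a generic Bioconductor workflow.

        ## Install the Bioconductor package manager, then the RforProteomics
        ## companion package.
        install.packages("BiocManager")
        BiocManager::install("RforProteomics")

        ## Load the package and open its vignettes, which contain the detailed
        ## code referred to in the abstract.
        library("RforProteomics")
        browseVignettes("RforProteomics")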

    Revisiting the thorny issue of missing values in single-cell proteomics

    Missing values are a notable challenge when analysing mass spectrometry-based proteomics data. While the field is still actively debating best practices, the challenge has increased with the emergence of mass spectrometry-based single-cell proteomics and the dramatic increase in missing values. A popular approach to dealing with missing values is imputation. Imputation has several drawbacks for which alternatives exist, but it remains a practical solution widely adopted in single-cell proteomics data analysis. This perspective discusses the advantages and drawbacks of imputation. We also highlight five main challenges linked to missing value management in single-cell proteomics. Future developments should aim to solve these challenges, whether through imputation or data modelling. The perspective concludes with recommendations for reporting missing values, for reporting methods that deal with missing values and for proper encoding of missing values. Comment: The code to reproduce the images presented in the manuscript is available in the GitHub repository: https://github.com/UCLouvain-CBIO/2023_scp_n
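
    On the encoding point above, a minimal sketch in base R (our own illustration, with an invented toy matrix) shows why missing values should be stored as NA rather than zero, and how the missingness rate can then be reported per feature.

        ## Toy peptide-by-cell intensity matrix in which an upstream tool
        ## encoded missing values as 0 (invented data).
        x <- matrix(c(1.2, 0.0, 3.1,
                      0.0, 2.4, 0.0,
                      5.0, 4.8, 4.9),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(paste0("pep", 1:3), paste0("cell", 1:3)))

        ## Recommended encoding: represent missing values as NA so they cannot
        ## be confused with true zero intensities in downstream statistics.
        x[x == 0] <- NA

        ## Report missingness per peptide and overall instead of hiding it.
        rowMeans(is.na(x))
        mean(is.na(x))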

    Towards reproducible MSMS data preprocessing, quality control and quantification

    The development of MSnbase aims at providing researchers dealing with labelled quantitative proteomics data with a transparent, portable, extensible and open-source collaborative framework to easily manipulate and analyse MS2-level raw tandem mass spectrometry data. The implementation in R gives users and developers a great variety of powerful tools to be used in a controlled and reproducible way. Furthermore, MSnbase has been developed following an object-oriented programming paradigm: all information that is manipulated by the user is encapsulated in ad hoc data containers to hide its underlying complexity. We illustrate the usage and achievements of our software using a published spiked-in data set in which varying quantities of test proteins have been labelled with four different iTRAQ tags. In addition to providing raw MSMS data, MSnbase also stores meta-data and logs processing steps in the data object itself for optimal traceability. We illustrate graphically how to inspect precursor data for quality control and how individual or merged MSMS spectra can subsequently be processed, plotted and extracted using a variety of methods. We also demonstrate how reporter ions (or any peaks of interest defined by the user) can easily be quantified and normalised using several built-in alternative strategies and how the effect of each transformation can be recorded, examined and reproduced. MSnbase constitutes a unique versatile working and development environment to process labelled MSMS data and provides in turn important feedback for data acquisition optimisation. We conclude by presenting future extensions of MSnbase and highlight its usage in reproducible proteomics research.
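
    As a hedged sketch of the workflow described above (the file name is a placeholder and the parameter choices are illustrative, not those of the published spiked-in data set), a typical MSnbase session reads raw MS2 spectra, quantifies the iTRAQ reporter ions and normalises the result, with every step logged in the object.

        library("MSnbase")

        ## Read raw MS2-level spectra; "itraq_run.mzXML" is a placeholder
        ## file name, not a file distributed with the package.
        raw <- readMSData("itraq_run.mzXML", msLevel. = 2)

        ## Quantify the four iTRAQ reporter ions by trapezoidation and
        ## normalise the resulting MSnSet, here with quantile normalisation.
        qnt <- quantify(raw, method = "trapezoidation", reporters = iTRAQ4)
        qnt <- normalise(qnt, method = "quantiles")

        ## Processing steps are recorded in the object itself for traceability.
        processingData(qnt)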

    Assessing the Applicability of the GTR Nucleotide Substitution Model Through Simulations

    The General Time Reversible (GTR) model of nucleotide substitution is at the core of many distance-based and character-based phylogeny inference methods. The procedure described by Waddell and Steel (1997) for estimating distances and instantaneous substitution rate matrices, R, under the GTR model is known to be inapplicable under some conditions, i.e., it leads to the inapplicability of the GTR model. Here, we simulate the evolution of DNA sequences along 12 trees characterized by different combinations of tree length, (non-)homogeneity of the substitution rate matrix R, and sequence length. We then evaluate both the frequency of the GTR model inapplicability for estimating distances and the accuracy of inferred alignments. Our results indicate that inapplicability of Waddell and Steel's procedure can be considered a real practical issue, and illustrate that the probability of this inapplicability is a function of substitution rates and sequence length.
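
    To make the notion of inapplicability concrete, the following R sketch (our own illustration, not code from the study) computes the pairwise GTR distance in the form commonly attributed to Waddell and Steel (1997), d = -trace(Pi log(Pi^-1 F)), and returns NA when the matrix logarithm, and hence the procedure, is inapplicable; the divergence matrix is invented for illustration.

        ## Fmat: observed 4x4 divergence (site pattern) matrix between two
        ## aligned sequences; Pi: diagonal matrix of base frequencies.
        gtr_distance <- function(Fmat) {
          Fmat <- (Fmat + t(Fmat)) / 2 / sum(Fmat)   ## symmetrise, normalise
          Pi   <- diag(rowSums(Fmat))
          e    <- eigen(solve(Pi) %*% Fmat)
          ## The matrix logarithm is undefined for complex or non-positive
          ## eigenvalues: this is the inapplicability discussed above.
          if (any(Im(e$values) != 0) || any(Re(e$values) <= 0))
            return(NA_real_)
          logM <- e$vectors %*% diag(log(Re(e$values))) %*% solve(e$vectors)
          -sum(diag(Pi %*% logM))
        }

        ## Toy counts of aligned site patterns (rows/columns: A, C, G, T).
        Fmat <- matrix(c(90,  3,  7,  2,
                          3, 85,  2,  6,
                          7,  2, 88,  3,
                          2,  6,  3, 92), nrow = 4, byrow = TRUE)
        gtr_distance(Fmat)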

    Mapping the sub-cellular proteome

    In biology, localisation is function. Cells display a complex sub-cellular structure with numerous distinct niches responsible for specific biological processes. Consequently, not only must proteins be present in a cell to accomplish their biological functions, but they must be localised in their intended sub-cellular locations. In contrast, mis-localised proteins can have serious adverse consequences. In this talk, I will present how contemporary experimental and computational technologies can be used to produce proteome-wide protein localisation maps.
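
    For readers wanting a concrete entry point, a hedged sketch using the Bioconductor pRoloc and pRolocdata packages (standard tools for this kind of spatial proteomics mapping; the example data set and parameters are illustrative, not material from the talk) visualises a published localisation map and classifies unlabelled proteins against curated organelle markers.

        library("pRoloc")
        library("pRolocdata")

        ## Example spatial proteomics experiment: protein profiles along a
        ## density gradient with curated organelle marker annotations.
        data(tan2009r1)

        ## Visualise the sub-cellular map on a PCA projection, coloured by
        ## marker class, then classify unlabelled proteins with kNN.
        plot2D(tan2009r1, fcol = "markers")
        res <- knnClassification(tan2009r1, fcol = "markers", k = 5)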

    Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies.

    Missing values are a genuine issue in label-free quantitative proteomics. Recent works have surveyed the different statistical methods to conduct imputation, have compared them on real or simulated data sets and have recommended a list of missing value imputation methods for proteomics applications. Although insightful, these comparisons do not account for two important facts: (i) depending on the proteomics data set, the missingness mechanism may be of different natures and (ii) each imputation method is devoted to a specific type of missingness mechanism. As a result, we believe that the question at stake is not to find the most accurate imputation method in general but instead the most appropriate one. We describe a series of comparisons that support our views: for instance, we show that a supposedly "under-performing" method (i.e., giving baseline average results), if applied at the "appropriate" time in the data-processing pipeline (before or after peptide aggregation) on a data set with the "appropriate" nature of missing values, can outperform a blindly applied, supposedly "better-performing" method (i.e., the reference method from the state-of-the-art). This leads us to formulate a few practical guidelines regarding the choice and the application of an imputation method in a proteomics context. This work was supported by the following funding: ANR-2010-GENOM-BTV-002-01 (Chloro-Types), ANR-10-INBS-08 (ProFI project, “Infrastructures Nationales en Biologie et Santé”, “Investissements d’Avenir”), EU FP7 program (Prime-XS project, Contract no. 262067), the Prospectom project (Mastodons 2012 CNRS challenge), and the BBSRC Strategic Longer and Larger grant (Award BB/L002817/1). This is the final version of the article. It first appeared from the American Chemical Society via https://dx.doi.org/10.1021/acs.jproteome.5b0098
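
    To illustrate the "appropriate method for the appropriate mechanism" point above, the following hedged sketch (invented toy data, not the study's pipeline) uses MSnbase's impute() to contrast a neighbour-based method, suited to values missing at random, with a low-value method such as MinDet, suited to left-censored, intensity-dependent missingness.

        library("MSnbase")

        ## Toy peptide-by-sample intensity matrix with missing values.
        set.seed(1)
        e <- matrix(rlnorm(60, meanlog = 5), nrow = 20,
                    dimnames = list(paste0("pep", 1:20), paste0("S", 1:3)))
        e[sample(length(e), 10)] <- NA
        msn <- MSnSet(exprs = e,
                      fData = data.frame(row.names = rownames(e)),
                      pData = data.frame(row.names = colnames(e)))

        ## Missingness assumed (mostly) random: neighbour-based imputation
        ## (uses the Bioconductor 'impute' package under the hood).
        imp_mar <- impute(msn, method = "knn")

        ## Left-censored, intensity-dependent missingness: a low-value
        ## strategy such as MinDet (uses the 'imputeLCMD' package).
        imp_mnar <- impute(msn, method = "MinDet")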

    Effects of Traveling Wave Ion Mobility Separation on Data Independent Acquisition in Proteomics Studies

    qTOF mass spectrometry and traveling wave ion mobility separation (TWIMS) hybrid instruments (q-TWIMS-TOF) have recently become commercially available. Ion mobility separation allows an additional dimension of precursor separation inside the instrument, without incurring an increase in instrument time. We comprehensively investigated the effects of TWIMS on data-independent acquisition on a Synapt G2 instrument. We observed that if fragmentation is performed post TWIMS, more accurate assignment of fragment ions to precursors is possible in data-independent acquisition. This allows up to 60% higher proteome coverage and higher confidence of protein and peptide identifications. Moreover, the majority of peptides and proteins identified upon application of TWIMS span the lower intensity range of the proteome. It has also been demonstrated in several studies that employing IMS results in higher peak capacity of separation and consequently more accurate and precise quantitation of lower intensity precursor ions. We observe that employing TWIMS results in an attenuation of the detected ion current. We postulate that this effect is twofold: sensitivity is reduced due to ion scattering during transfer into a high-pressure “IMS zone”, and due to the saturation of the detector digitizer as a result of the IMS concentration effect. This latter effect limits the useful linear range of quantitation, compromising quantitation accuracy of high intensity peptides. We demonstrate that the signal loss from detector saturation and transmission loss can be deconvoluted by investigation of the peptide isotopic envelope. We discuss the origin and extent of signal loss and suggest methods to minimize these effects on the q-TWIMS-TOF instrument in the light of different experimental designs and other IMS/MS platforms described previously.
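
    As a loose, hedged illustration of the isotopic envelope idea above (our own sketch, with invented numbers and a crude averagine-style Poisson approximation; it is not the authors' algorithm), one can flag possible digitizer saturation when the observed share of the expected most intense isotope falls well below its theoretical share.

        ## Expected relative intensities of the first n isotope peaks of a
        ## peptide of a given mass, using a rough Poisson approximation
        ## (about 0.000475 expected 13C atoms per Da of peptide mass).
        expected_envelope <- function(mass, n = 4) {
          lambda <- 0.000475 * mass
          p <- dpois(0:(n - 1), lambda)
          p / sum(p)
        }

        ## Flag possible saturation when the observed share of the expected
        ## most intense peak is much lower than predicted (tolerance 'tol').
        flag_saturation <- function(observed, mass, tol = 0.8) {
          obs  <- observed / sum(observed)
          expd <- expected_envelope(mass, length(observed))
          top  <- which.max(expd)
          obs[top] / expd[top] < tol
        }

        ## Invented example: a 2 kDa peptide whose monoisotopic peak is capped.
        flag_saturation(observed = c(3.0e5, 3.8e5, 2.2e5, 0.9e5), mass = 2000)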